An Example of Hierarchical Ordination

This document describes fitting a hierarchical ordination to data, including code and the nasty details that are needed.

The data are included in the ‘mvabund’ ‘R’-package, but were originally collected by Gibb et al. They consist of the observed abundances of 41 ant species at 30 sites in Australia, along with 7 environmental variables and 5 traits. Our question is how the environmental variables and traits affect the community composition, and whether they interact.

The abundances, traits and environment are each stored in a different matrix. First we load the data and set some constants:

library(nimble)
library(nimbleHMC)
library(mvabund)
data("antTraits")
Y <- antTraits$abund # abundance data of sites by species
X <- scale(antTraits$env) # environmental variables of sites by predictors
TR <- scale(model.matrix(~0+.,antTraits$traits)) # species traits

NSites <- nrow(Y) # number of sites
NSpecies <- ncol(Y) # number of species
NTraits <- ncol(TR) # number of traits
NEnv <- ncol(X) # number of environmental predictors

# Create data lists for Nimble
dat <- list(Y = Y, X = X, TR = TR)
consts <- list(NSites = NSites, NEnv = NEnv, NTraits = NTraits, NSpecies = NSpecies)

The Model

We assume that the counts follow a Poisson distribution with a log link function (as we would do in a standard GLM). We assume a species effect (\(\beta_{0j}\) for species \(j\)), and the rest is the hierarchical ordination, so the model for the log of the mean abundance (\(\eta_{ij}\)) is

\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j. \] The idea here is that \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\) are the site scores and species loadings, equivalent to a typical ordination. But here we assume that they each have a variance of 1, so that \(\boldsymbol{\Sigma}\) holds the scales of the latent variables: it will typically be a diagonal matrix, and if any of the terms on the diagonal are close to 0, this suggests that latent variable has (almost) no effect.

This is the same as a classical GLLVM, but here we go further by modelling both \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\). As a simplification, we can think of this as simply a regression of \(\boldsymbol{z}_i\) (the site effects) against the environmental variables, and of \(\boldsymbol{\gamma}_j\) against the traits. In reality, it is more complicated, because \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\) are estimated in the model, so their uncertainty needs to be included.
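To make the two layers concrete, here is a minimal sketch that simulates one draw from the hierarchical ordination. The dimensions and coefficients are made up for illustration (this is not the ant data or the fitted model):

```r
# A minimal generative sketch of the hierarchical ordination:
# site scores are a regression on the environment, species loadings a
# regression on traits, and the linear predictor combines the two.
simHO <- function(n = 10, p = 6, K = 3, Tr = 2, d = 1) {
  X  <- matrix(rnorm(n * K), n, K)               # environmental predictors
  TR <- matrix(rnorm(p * Tr), p, Tr)             # species traits
  B     <- matrix(rnorm(K * d), K, d)            # environment -> site scores
  omega <- matrix(rnorm(Tr * d), Tr, d)          # traits -> species loadings
  z     <- X %*% B + matrix(rnorm(n * d, sd = 0.5), n, d)      # site scores
  gamma <- TR %*% omega + matrix(rnorm(p * d, sd = 0.5), p, d) # species loadings
  Sigma <- diag(0.7, d)                          # latent-variable scales
  beta0 <- rnorm(p)                              # species intercepts
  eta <- outer(rep(1, n), beta0) + z %*% Sigma %*% t(gamma)    # linear predictor
  Y <- matrix(rpois(n * p, exp(eta)), n, p)      # Poisson counts
  list(Y = Y, eta = eta, z = z, gamma = gamma)
}
set.seed(1)
sim <- simHO()
dim(sim$Y) # a sites-by-species count matrix
```

The function names and dimensions here are hypothetical; the real model below estimates all of these quantities jointly.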

The mathematical details are hidden away here, for those who want them.

We denote the abundance at site \(i = 1 \dots n\) for species \(j = 1 \dots p\) as \(Y_{ij}\), or as a matrix as \(\boldsymbol{Y}\). The environmental predictors are \(X_{ik}\) for the \(k = 1 \ldots K\) predictors (or \(\boldsymbol{X}\) as a matrix), and the traits are \(tr_{jt}\) for the \(t = 1\ldots T\) traits (or \(\boldsymbol{TR}\) as a matrix). We then assume

\[Y_{ij} \sim \text{Pois}(\lambda_{ij}),\]

with

\[\log(\lambda_{ij}) = \eta_{ij}.\]

Here \(\eta_{ij}\) is the linear predictor, which we will model with the hierarchical ordination:

\[ \eta_{ij} = \beta_{0j} + \boldsymbol{z}_i^\top \boldsymbol{\Sigma} \boldsymbol{\gamma}_j \]

We then model \(\boldsymbol{z}_i\) (as in van der Veen et al (2023)) and \(\boldsymbol{\gamma}_j\) hierarchically:

\[ \boldsymbol{z}_i = \boldsymbol{B}^\top\boldsymbol{x}_i + \boldsymbol{\epsilon}_i \] and

\[ \boldsymbol{\gamma}_j = \boldsymbol{\omega}^\top\boldsymbol{tr}_{j} + \boldsymbol{\varepsilon}_j \] where

  • \(x_{ik}\) is the \(k^{th}\) predictor (i.e. environmental effect) at site \(i\)
  • \(\boldsymbol{B}\) with entry \(b_{kq} \sim \mathcal{N}(0,1)\) is the effect of the \(k^{th}\) predictor on the site score for the \(q^{th} = 1\ldots d\) latent variable
  • \(\boldsymbol{\epsilon}_i\) with entry \(\epsilon_{iq} \sim \mathcal{N}(0, \sigma^2_q)\) is a vector of residuals for the unexplained part of the site score
  • \(tr_{jt}\) is the \(t^{th}\) predictor (i.e. trait) for species \(j\)
  • \(\boldsymbol{\omega}\) with entry \(\omega_{tq} \sim \mathcal{N}(0,1)\) is the effect of the \(t^{th}\) trait on the species loading for the \(q^{th}\) latent variable
  • \(\boldsymbol{\varepsilon}_j\) with entry \(\varepsilon_{jq} \sim \mathcal{N}(0, \delta^2_q)\) is a vector of residuals for the unexplained part of the species loading
Note that the predictors are all standardized to zero mean, variance one. We additionally place exponential priors on the scale parameters \(\boldsymbol{\sigma}\) and \(\boldsymbol{\delta}\) with rate parameter one.

Implementation

We fit the model with the \(\texttt{nimble}\) package. We start with a single dimension for simplicity, so that we can show the steps needed.

One dimensional ordination

This is the code for the model with one dimension:

OneLV.nim <- nimbleCode({
  for (i in 1:NSites) {
    for (j in 1:NSpecies) {
# These three lines specify the model:   
      Y[i, j] ~ dpois(lambda[i, j]) # Likelihood: Y follows a Poisson distribution
      log(lambda[i, j]) <- eta[i, j] # Specify the log link function
      eta[i,j] <- beta0[j] + gammas[j]*LVsd*zs[i] # linear predictor: species + LV
    }
# Site scores: regression against environmental effects
#    the Bs are the coefficients of the environmental effects
      epsilon[i] ~ dnorm(0, sd = Sitesd) # Residual site effect
      XB[i] <- inprod(X[i, 1:NEnv],B[1:NEnv])
      z[i] <- XB[i] + epsilon[i]
  }
  
  for(j in 1:NSpecies) {
# Species effects: regression against trait effects
#    The Os are the trait effects.
      omegaTR[j] <- inprod(TR[j, 1:NTraits],O[1:NTraits])
      varepsilon[j] ~ dnorm(0, sd = Speciesd) # Residual
      gamma[j] <- omegaTR[j] + varepsilon[j]
      beta0[j] ~ dnorm(0, sd = 1) # Species intercept
  }
  
# Here we standardise z and gamma, so they both have a variance of 1.
#   The standard deviation of the latent variable is LVsd.
    zmu <- mean(z[1:NSites])
    zs[1:NSites] <- (z[1:NSites]-zmu)/sqrt(mean((z[1:NSites] - zmu)^2)) #scale z to unit sd and center
    gammamu<- mean(gamma[1:NSpecies])
    gammas[1:NSpecies] <- (gamma[1:NSpecies]-gammamu)/sqrt(mean((gamma[1:NSpecies] - gammamu)^2)) #scale gamma to unit sd and center
    # priors for scales
    Sitesd ~ dexp(1)
    Speciesd ~ dexp(1)
    LVsd ~ dexp(1)

    # Priors for the environmental and trait effects
    for(k in 1:NEnv){
      B[k] ~ dnorm(0, sd = 1)  
    }
    for(tr in 1:NTraits){
      O[tr] ~ dnorm(0, sd = 1)
    }
})

These models are notorious for being unidentifiable, i.e. you can get the same mean abundances from different combinations of the parameters. We therefore have to make some adjustments to the model: some of this is done in the model fitting, but for others it is easier to do it afterwards.
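A tiny illustration of one such non-identifiability, using a hypothetical one-LV example rather than the fitted model: flipping the signs of both the site scores and the species loadings leaves the linear predictor (and hence the likelihood) untouched.

```r
# Sign-switching: (z, gamma) and (-z, -gamma) give identical linear predictors,
# so the likelihood cannot tell the two parameter sets apart.
set.seed(1)
z <- rnorm(5)      # site scores
gamma <- rnorm(4)  # species loadings
LVsd <- 0.7        # latent-variable scale
eta1 <- LVsd * outer(z, gamma)
eta2 <- LVsd * outer(-z, -gamma)
all.equal(eta1, eta2) # TRUE: both sign conventions fit the data equally well
```

This is why the post-processing below flips whole draws to a common sign convention rather than trying to constrain the sampler.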

The details of what we do to make this identifiable are here
As formulated above, the model is unidentifiable (as latent variable models usually are). First, we standardise \(\boldsymbol{z}_i\) and \(\boldsymbol{\gamma}_j\) to unit variance per latent variable to prevent scale invariance. Note that at this point the model is still invariant to sign switching. Dealing with sign switching is a bit messy. We could solve it by placing a truncated normal prior on the main diagonal entries of \(\boldsymbol{B}\) or \(\boldsymbol{\omega}\), but that still results in a bimodal posterior distribution for other coefficients. Hence, it is easier to leave this as it is and post-process the MCMC chains so that the diagonal entries of \(\boldsymbol{B}\) are always positive. Note, however, that for \(d > K\) the model is still invariant to sign switches, so if the number of latent variables is greater than the number of predictors, additional constraints need to be imposed on the diagonal of \(\boldsymbol{\omega}\) or on the \(\boldsymbol{\epsilon}_i\)’s or \(\boldsymbol{\varepsilon}_j\)’s instead.
We need a function to generate initial values, but this is boring
inits<-function(consts){
  B = rnorm(consts$NEnv)
  O = rnorm(consts$NTraits)
  varepsilon = rnorm(consts$NSpecies)
  epsilon = rnorm(consts$NSites)
  list(
    B = B,
    O = O,
    epsilon = epsilon,
    varepsilon = varepsilon,
    Sitesd = rexp(1),
    Speciesd = rexp(1),
    beta0 = rnorm(consts$NSpecies),
    LVsd = rexp(1)
  )
}

Because we are going to parallelise the model fitting, we put the code to do this in a function:

Functions to run the code are here. They may be overkill, but they make it easy to run several chains in parallel.
# Function to run one chain: it can be done with HMC or other MCMC algorithms.
RunOneChain <- function(seed, dat, code, inits, consts, Nburn=5e3, NIter=5.5e4, Nthin=10, HMC = FALSE, ...) {
  require(nimble)
  require(nimbleHMC)
  Mons <- c("beta0","Speciesd","Sitesd","LVsd","B","O","epsilon","varepsilon")
  if(HMC) { # HMC everything
    res <- nimbleHMC(code = code, constants = consts, 
                       inits = inits(consts), data = dat, monitors = Mons, 
                       niter=NIter, nburnin = Nburn, thin=Nthin, nchains = 1,
                       samplesAsCodaMCMC = TRUE)
  } else {
    mod <- nimbleModel(code = code, name = "HO", constants = consts, inits = inits(consts),
                       data = dat)
    model <- compileNimble(mod)
    conf <- configureMCMC(model, monitors = Mons, print = FALSE)
    # Change the samplers to block update, using a nifty extension of the slice sampler
    conf$removeSamplers(c('epsilon', 'varepsilon'))
    conf$addSampler(target = 'epsilon', type = "AF_slice")
    conf$addSampler(target = 'varepsilon', type = "AF_slice")
    mcmc <- buildMCMC(conf)
    cmcmc <- compileNimble(mcmc, project = model)
    res <- runMCMC(cmcmc,  niter=NIter, nburnin = Nburn, thin=Nthin, 
                   nchains = 1, samplesAsCodaMCMC = TRUE, ...)
    
  }
  return(res)
}

# Function to run parallel chains
ParaNimble <- function(NChains, ...) {
  require(parallel)
  nimble_cluster <- makeCluster(NChains)
  
  samples <- parLapply(cl = nimble_cluster, X = 1:NChains, ...)
  stopCluster(nimble_cluster)

  # Name the chains in the list
  chains <- setNames(samples, paste0("chain", 1:length(samples)))
  chains
}

Now we can run the MCMC.

chains.unsgn <- ParaNimble(4, fun = RunOneChain,
                       dat = dat,
                       code = OneLV.nim,
                       inits = inits, HMC = TRUE,
#                       Nburn=5e1, NIter=5.5e2, Nthin=1, # for a small run
                       Nburn=1e3, NIter=2e3, Nthin=1, # for a big HMC run
                       consts = consts)
## Loading required package: parallel

Results

There are a lot of parameters, so a lot of results to look at. Here, we present the traceplots of \(\boldsymbol{B}\), \(\boldsymbol{\omega}\), \(\boldsymbol{\sigma}\) and \(\boldsymbol{\delta}\), and a latent variable-plot.

We need to apply the sign constraint to the chains: the function for that is here
# Function to apply sign constraints to one set of parameters
# iter is one iteration of chain x that we need to correct for sign
ProcessOneDraw <- function(iter, Consts){
  if(is.null(Consts$nLVs)) Consts$nLVs <- 1
  # we start by separating into the different components, B,  O, epsilon, varepsilon
  bs <- iter[grep("^B", names(iter))]
  os <- iter[grep("^O", names(iter))]
  varepsilon <- matrix(iter[grep("^varepsilon", names(iter))],
                       ncol=Consts$nLVs, nrow=Consts$NSpecies)
  # epsilons, not to confuse with the varepsilons
  epsilon <- matrix(iter[grep("^epsilon", names(iter))],
                       ncol=Consts$nLVs, nrow=Consts$NSites)
  # to matrix
  B <- matrix(bs,ncol=Consts$nLVs,nrow=Consts$NEnv)
  O <- matrix(os,ncol=Consts$nLVs,nrow=Consts$NTraits)
  
  # get sign of main diagonal entries of B
  signs <- sign(diag(B))
  if(all(signs>0)){ #if these are all positive, no sign swapping is needed, so  we're done
    return(iter)
  }else{
    sign.mat = diag(signs, ncol= Consts$nLVs,nrow=Consts$nLVs)
    # otherwise, we need to swap!
    B = B%*%sign.mat
    O = O %*% sign.mat
    epsilon = epsilon %*% sign.mat
    varepsilon = varepsilon %*% sign.mat
    # now put the sign-swapped coefs back
    iter[grep("^B", names(iter))] <- c(B)
    iter[grep("^O", names(iter))] <- c(O)
    iter[grep("^epsilon", names(iter))] <- c(epsilon)
    iter[grep("^varepsilon", names(iter))] <- c(varepsilon)
    #aaand to the next iteration
    return(iter)
  }
}

# Function to process a chain, looking over the iterations, to ensure identifiability
postProcess <- function(ch, konsts, progress=FALSE){
  require(coda)
  # chains should be a list of length nchains
  total <- length(ch)
  if(progress) pb <- txtProgressBar(min = 0, max = total, style = 3)
  i = 0
  lst <- lapply(ch, function(x, Cnsts){
    res <- t(apply(as.matrix(x),1,ProcessOneDraw, Consts = Cnsts))
    i <<- i+1
    # update progress bar
    if(progress) setTxtProgressBar(pb, i)
    # restore parameter names
    colnames(res) <- colnames(x)
    return(as.mcmc(res))
  }, Cnsts = konsts)
 as.mcmc.list(lst)
}

Now we can look at some lovely chains, to see that they have converged. First the environmental effects

# post-process chains for sign-swapping
chains <- postProcess(chains.unsgn, konsts = consts)
summ.HO = summary(chains)

library(basicMCMCplots)
par(mfrow=c(1,1))
chainsPlot(chains, var = c("B"), legend = F,traceplot=T)

Then the trait effects

chainsPlot(chains, var = c("O"), legend = F,traceplot=TRUE)

These look quite OK (after post-processing). Now we can create a plot of the LV-scores against their indices to inspect the results:

ExtractMeans <- function(summ, name, d=1) {
  if(is.null(d)) d <- 1
  var <- summ$statistics[grep(paste0('^', name), rownames(summ$statistics)),"Mean"]
  matrix(var,ncol=d)
}

# First a plot of the site scores
eps.m <- ExtractMeans(summ.HO, name="epsilon", d=consts$nLVs)
B.m <- ExtractMeans(summ.HO, name="B", d=consts$nLVs)
LVs <- X%*%B.m+eps.m

# Now the species loadings
vareps.m <- ExtractMeans(summ.HO, name="varepsilon", d=consts$nLVs)
O.m <- ExtractMeans(summ.HO, name="O", d=consts$nLVs)
gamma <- TR%*%O.m+vareps.m

par(mfrow=c(1,2))
plot(LVs,type="n", xlab="Sites", ylab="Latent variable 1", main = "Sites")
text(LVs,labels=1:consts$NSites)

plot(gamma,type="n", xlab="Species", ylab="Latent variable 1", main = "Species")
text(gamma,labels=vegan::make.cepnames(colnames(Y)))

More dimensions

More than one dimension requires adding extra identifiability constraints. Now that we have two or more latent variables, we also have to worry about rotations between them.

For those who are interested, this is how we deal with the rotations

There are various ways to place the constraints, some more numerically stable than others. Generally, we note that the model on the link scale is similar to existing matrix decompositions, so that much of the required constraints can be learned from those.

To prevent rotational invariance, we require \(\boldsymbol{B}\) and \(\boldsymbol{\omega}\) to have zeros above the main diagonal. van der Veen et al (2023) instead assumed \(\boldsymbol{B}\) to be a semi-orthogonal matrix, as they encountered numerical instability with the constraint that we impose here. However, their constraint would require sampling on constrained parameter spaces, which is difficult, while this formulation allows us to use standard out-of-the-box Markov chain Monte Carlo samplers (also, it seems to work fine here). Note that when \(d\) exceeds the number of predictors or traits, the model is again rotationally invariant, and additional constraints (e.g., order constraints for the latent variables as in a varimax rotation) will need to be added.
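The rotational invariance itself is easy to verify numerically. In this toy example (not the ant model), the latent-variable scales are absorbed into the scores; rotating both the site scores and the species loadings by the same orthogonal matrix then leaves the linear predictor unchanged:

```r
# Rotation invariance with d = 2: Z Q (G Q)^T = Z Q Q^T G^T = Z G^T,
# because Q Q^T = I for an orthogonal (rotation) matrix Q.
set.seed(1)
Z <- matrix(rnorm(6 * 2), 6, 2)  # site scores (LV scales absorbed)
G <- matrix(rnorm(4 * 2), 4, 2)  # species loadings
theta <- 0.4
Q <- matrix(c(cos(theta), sin(theta),
              -sin(theta), cos(theta)), 2, 2) # 2-d rotation by theta
eta1 <- Z %*% t(G)
eta2 <- (Z %*% Q) %*% t(G %*% Q)
max(abs(eta1 - eta2)) # numerically zero
```

The zeros above the diagonal of \(\boldsymbol{B}\) and \(\boldsymbol{\omega}\) rule out exactly these rotations.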

The new model code is almost the same as before:

# Update our constants from before with the new number of LVs, rest remains the same
consts$nLVs <- nLVs <- 2
# Model code
HO.nim <- nimbleCode({
  for (i in 1:NSites) {
    for (j in 1:NSpecies) {
      eta[i,j] <- beta0[j] + sum(gammas[j,1:nLVs]*LVsd[1:nLVs]*zs[i,1:nLVs])
      log(lambda[i, j]) <- eta[i, j]
      Y[i, j] ~ dpois(lambda[i, j])
    }      
    for (q in 1:nLVs) {
      XB[i, q] <- sum(X[i, 1:NEnv]*B[1:NEnv, q])
      epsilon[i,q] ~ dnorm(0, sd = Sitesd[q]) # Residual
      z[i,q] <- XB[i,q] + epsilon[i,q]
    }
  }
  
  for(j in 1:NSpecies) {
    for (q in 1:nLVs) {
      omegaTR[j, q] <- sum(TR[j, 1:NTraits]*O[1:NTraits, q])
      varepsilon[j,q] ~ dnorm(0, sd = Speciesd[q]) # Residual
      gamma[j,q] <- omegaTR[j,q] + varepsilon[j,q]
    }
    
    beta0[j] ~ dnorm(0, sd=1)
  }
  # Constraints to 0 on upper diagonal
  # stole some code from Boral for this - thanks Francis
  for(i in 1:(nLVs-1)) { 
    for(j in (i+1):(nLVs)) {
      B[i,j] <- 0 
      O[i,j] <- 0
    } 
  }
  
  for(i in 1:nLVs) { 
    # diagonal elements
    B[i,i] ~ dnorm(0, sd = 1)
    O[i,i] ~ dnorm(0, sd = 1) # or T(dnorm(0, sd = 1), 0, Inf) for a truncated prior
    ## standardizing z and gamma
    zmu[i] <- mean(z[1:NSites,i])
    zs[1:NSites,i] <- (z[1:NSites,i]-zmu[i])/sqrt(mean((z[1:NSites,i] - zmu[i])^2)) #scale z to unit sd and center
    gammamu[i] <- mean(gamma[1:NSpecies,i])
    gammas[1:NSpecies,i] <- (gamma[1:NSpecies,i]-gammamu[i])/sqrt(mean((gamma[1:NSpecies,i] - gammamu[i])^2)) #scale gamma to unit sd and center
    # priors for scales
    Sitesd[i] ~ dexp(1)
    Speciesd[i] ~ dexp(1)
    LVsd[i] ~ dexp(.1)
    } 
  
  ## Free lower diagonals
  for(i in 2:nLVs) { 
    for(j in 1:(i-1)) { 
      B[i,j] ~ dnorm(0, sd = 1)
      O[i,j] ~ dnorm(0, sd = 1)
    } 
  }
  for(i in (nLVs+1):NEnv) { for(j in 1:nLVs) { B[i,j] ~ dnorm(0, sd = 1) } } ## All other elements of B
  for(i in (nLVs+1):NTraits) { for(j in 1:nLVs) { O[i,j] ~ dnorm(0, sd = 1) } } ## All other elements of O

})
We create a new function to simulate starting values from the prior distributions, which we can hide away
inits<-function(consts){
  B = matrix(rnorm(consts$nLVs*consts$NEnv),ncol=consts$nLVs)
  B[upper.tri(B)] = 0
  O = matrix(rnorm(consts$nLVs*consts$NTraits),nrow=consts$NTraits)
  O[upper.tri(O)] = 0
  varepsilon = mvtnorm::rmvnorm(consts$NSpecies,rep(0,consts$nLVs),diag(consts$nLVs))
  epsilon = mvtnorm::rmvnorm(consts$NSites,rep(0,consts$nLVs),diag(consts$nLVs))
  list(
    B = B,
    O = O,
    epsilon = epsilon,
    varepsilon = varepsilon,
    Sitesd = rexp(consts$nLVs),
    Speciesd = rexp(consts$nLVs),
    beta0 = rnorm(consts$NSpecies),
    LVsd = rexp(consts$nLVs)
  )
}

And run it:

HO.2LV <- ParaNimble(4, fun = RunOneChain,
                       dat = dat,
                       code = HO.nim,
                       inits = inits, HMC=TRUE,
#                       Nburn=5e1, NIter=5.5e2, Nthin=1, # for a small run
                       Nburn=1e3, NIter=4e3, Nthin=1, # for a big HMC run
                       consts = consts)

Results

Again, check they work. Environment first:

# post-process chains for sign-swapping
chains2LV <- postProcess(HO.2LV, konsts = consts)
summ.HO2LV = summary(chains2LV)

par(mfrow=c(1,1))
chainsPlot(chains2LV, var = c("B"), legend = F, traceplot=TRUE)

And the trait effects:

chainsPlot(chains2LV, var = c("O"), legend = F, traceplot=TRUE)

The traceplots look quite OK. Now, we can make two-dimensional ordination plots of sites and species, with their predictor effects.

We need a function to plot the arrows, which is hidden here
AddArrows <- function(coords, marg= par("usr"), col=2) {
  origin <- c(mean(marg[1:2]), mean(marg[3:4]))
  Xlength <- sum(abs(marg[1:2]))/2
  Ylength <- sum(abs(marg[3:4]))/2
  ends <- coords / max(abs(coords)) * min(Xlength, Ylength) * .8
  arrows(
    x0 = origin[1],
    y0 = origin[2],
    x1 = ends[, 1] + origin[1],
    y1 = ends[, 2] + origin[2],
    col = col,
    length = 0.1)
  text(
    x = origin[1] + ends[, 1] * 1.1,
    y = origin[2] + ends[, 2] * 1.1,
    labels = rownames(coords),
    col = col)
  
}
# First a plot of the site scores
eps.m <- ExtractMeans(summ.HO2LV, name="epsilon", d=consts$nLVs)
B.m <- ExtractMeans(summ.HO2LV, name="B", d=consts$nLVs)
rownames(B.m) <- colnames(X)
LVs <- X%*%B.m+eps.m

# Now the species loadings
vareps.m <- ExtractMeans(summ.HO2LV, name="varepsilon", d=consts$nLVs)
O.m <- ExtractMeans(summ.HO2LV, name="O", d=consts$nLVs)
rownames(O.m) <- colnames(TR)
gamma <- TR%*%O.m+vareps.m


par(mfrow=c(1,2))

#LVs
plot(LVs,type="n", xlab="Latent variable 1", ylab="Latent variable 2", main = "Sites")
text(LVs,labels=1:consts$NSites)
AddArrows(marg = par("usr"), coords=B.m, col="red")

#gammas
plot(gamma,type="n", xlab="Latent variable 1", ylab="Latent variable 2", main = "Species")
text(gamma,labels=vegan::make.cepnames(colnames(Y)))
AddArrows(marg = par("usr"), coords=O.m, col="blue")
## Warning in arrows(x0 = origin[1], y0 = origin[2], x1 = ends[, 1] + origin[1], :
## zero-length arrow is of indeterminate angle and so skipped

We can also look at some summary statistics for the scale parameters of the latent variables, species-specific residuals and site-specific residuals, respectively:

SDs <- summ.HO2LV$statistics[grep("sd", rownames(summ.HO2LV$statistics)),c("Mean", "SD")]
knitr::kable(SDs, digits=2, format = "html", table.attr = "style='width:30%;'")
              Mean   SD
LVsd[1]       0.73 0.13
LVsd[2]       0.73 0.13
Sitesd[1]     0.45 0.44
Sitesd[2]     0.48 0.48
Speciesd[1]   0.04 0.05
Speciesd[2]   0.07 0.09

These tell us how well the predictors explain the ordination; the species- and site-specific scale parameters would be zero if the predictors fully explained the ordination. The scale parameters for the latent variables are similar to singular values (the square roots of eigenvalues) in a classical ordination: they reflect a dimension’s importance to the response.

Residual covariance matrix

The covariance of species is determined by traits, LV-specific variation and sites’ residual variation, and the covariance between sites is determined by the environment, LV-specific variation and species’ residual variation.

The maths behind this is covered in here

The residual covariance matrix of the model (where you calculate species associations from) is determined by three terms:

  1. \(\boldsymbol{X}\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j \sim \mathcal{N}(0,\boldsymbol{X}\boldsymbol{B}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{X}^\top)\)
  2. \(\boldsymbol{TR}\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i\sim \mathcal{N}(0,\boldsymbol{TR}\boldsymbol{\omega}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{TR}^\top)\)
  3. \(\boldsymbol{\epsilon}_i^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j = \sum \limits^{d}_{q=1}\Sigma_{q,q}\epsilon_{iq}\varepsilon_{jq}\), where each term has a density proportional to \(\mathcal{K}_0(\sigma^{-1}_q\delta^{-1}_q\vert\epsilon_{iq}\varepsilon_{jq}\vert)\),

where \(\mathcal{K}_0\) denotes the zero order modified Bessel function of the second kind. The first term tells us that correlations between species are determined by the environment. The second term tells us that the correlation between sites is determined by species traits. The last term induces no correlation between sites and species for diagonal \(\boldsymbol{\Sigma}\), but serves to scale species associations. Consequently, the covariance for species \(j,j2\) and site \(i,i2\) is: \[\begin{multline} \Sigma_{i,i2,j,j2} = \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2}) + \text{cov}(\boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i,\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i},\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2}) + \\ \text{cov}(\boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_i,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j},\boldsymbol{x}_{i2}^\top\boldsymbol{B}\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) + \text{cov}(\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j},\boldsymbol{tr}_{j2}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\boldsymbol{\epsilon}_{i2})+ 
\text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}). \end{multline}\] Third-order terms are zero for zero-mean normal random variables, so this simplifies to: \[\begin{equation} \Sigma_{i,i2,j,j2} = \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2}+ \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} + \boldsymbol{tr}_j^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\varepsilon}_{j2})\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2} + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} + \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j2}). \end{equation}\] Here, \(\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}) = \text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\varepsilon}_{j2}) = 0\), and \(\text{cov}(\boldsymbol{\epsilon}_i^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i2}^\top\boldsymbol{\Sigma}\boldsymbol{\varepsilon}_{j2}) = 2\text{tr}(\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma})\). Further, \(\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2})\) is zero for \(j \neq j2\) and \(\text{diag}(\boldsymbol{\delta}^2)\) otherwise, and similarly for \(\text{cov}(\boldsymbol{\epsilon}_i,\boldsymbol{\epsilon}_{i2})\).

Consequently, for a species association matrix where we consider the block of the covariance matrix where \(i = i2\), or for the site-to-site matrix where we consider the block \(j = j2\), we have:

\[\begin{split} \Sigma_{j,j2} &= \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i} + \text{cov}(\boldsymbol{\epsilon}_i^\top \boldsymbol{\Sigma}\boldsymbol{\varepsilon}_j,\boldsymbol{\epsilon}_{i}^\top \boldsymbol{\Sigma} \boldsymbol{\varepsilon}_{j2}) + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2} \\ &= \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\varepsilon}_j,\boldsymbol{\varepsilon}_{j2}) \boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i} + \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j2}+ 2\text{tr}\{\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\}, \end{split}\]

and

\[\begin{equation} \Sigma_{i,i2} = \boldsymbol{tr}_{j}^\top\boldsymbol{\omega}\boldsymbol{\Sigma}\text{cov}(\boldsymbol{\epsilon}_{i},\boldsymbol{\epsilon}_{i2})\boldsymbol{\Sigma}\boldsymbol{\omega}^\top\boldsymbol{tr}_{j} + \boldsymbol{x}_i^\top\boldsymbol{B}\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\boldsymbol{B}^\top\boldsymbol{x}_{i2} + 2\text{tr}\{\text{diag}(\boldsymbol{\delta}^2)\boldsymbol{\Sigma}\text{diag}(\boldsymbol{\sigma}^2)\boldsymbol{\Sigma}\}. \end{equation}\]

We can visualize these matrices for, e.g., the first site and the first species, respectively:

#species
spec.mat <- matrix(0, nrow=NSpecies, ncol=NSpecies)
# Note: these are squared scales, so Sigma here corresponds to Sigma^2 in the maths
Sigma <- diag(SDs[grep("LV", rownames(SDs)),1]^2)
Sigma.sp <- diag(SDs[grep("Specie", rownames(SDs)),1]^2)
Sigma.si <- diag(SDs[grep("Site", rownames(SDs)),1]^2)

for(j in 1:NSpecies){
  for(j2 in 1:NSpecies){
    if(j==j2){ # cov(varepsilon_j, varepsilon_j2) is only non-zero on the diagonal
      spec.mat[j,j2] = X[1,,drop=F]%*%B.m%*%Sigma%*%Sigma.sp%*%t(B.m)%*%t(X[1,,drop=F])
    }
    spec.mat[j,j2] = spec.mat[j,j2] + TR[j,,drop=F]%*%O.m%*%Sigma%*%Sigma.si%*%t(O.m)%*%t(TR[j2,,drop=F]) + 2*sum(diag(Sigma.sp%*%Sigma%*%Sigma.si))
  }
}

spec.cor.mat <- cov2cor(spec.mat)
colnames(spec.cor.mat) <- row.names(spec.cor.mat) <- colnames(Y)
corrplot::corrplot(spec.cor.mat,type = "lower",order = "AOE", main = "Species", mar = c(1,1,1,1),tl.srt=45,tl.cex = .5)

#sites
site.mat <- matrix(0, nrow=NSites, ncol=NSites)

for(i in 1:NSites){
  for(i2 in 1:NSites){
    if(i==i2){ # cov(epsilon_i, epsilon_i2) is only non-zero on the diagonal
      site.mat[i,i2] = TR[1,,drop=F]%*%O.m%*%Sigma%*%Sigma.si%*%t(O.m)%*%t(TR[1,,drop=F])
    }
    site.mat[i,i2] = site.mat[i,i2] + X[i,,drop=F]%*%B.m%*%Sigma%*%Sigma.sp%*%t(B.m)%*%t(X[i2,,drop=F]) + 2*sum(diag(Sigma.sp%*%Sigma%*%Sigma.si))
  }
}

site.cor.mat <- cov2cor(site.mat)
colnames(site.cor.mat) <- row.names(site.cor.mat) <- paste("Site", 1:NSites)
corrplot::corrplot(site.cor.mat,type = "lower",order = "AOE", main = "Sites", mar = c(3,3,3,3),tl.srt=45,tl.cex = .5)